Documentation: Building Effective Documentation and Data Dictionaries

Zahier Nasrudin

About me

  • Current Position: Lead Data Scientist at NIQ

  • Academic Background:

    • Master’s in Data Science
  • Experience:

    • Experience in the FMCG and market research industries.

Welcome

  • This is a flexible workshop!
  • Feel free to raise your hand at any time if you have questions.
  • There will be a dedicated 10-minute Q&A session at the end.

What is Documentation?

  • Documentation is clear and comprehensive information, guidelines, prerequisites, and descriptions for a process, project, or dataset.

    • Serves as a reference for understanding the data, methods, and workflows used.
  • Ensures consistency, reproducibility, and effective collaboration across teams.

Importance of Documentation

  • Understanding: Documentation explains the processes involved, what the data represents, where it comes from, and how it is structured.

  • Consistency: Documentation sets clear guidelines for how work is structured, so that outcomes stay consistent.

  • Collaboration: Documentation supports communication among stakeholders and team members, and helps them understand and build upon existing work.

Template or Example

  • Introduction

    • Give a brief overview of the documentation. For example: "This documentation covers the guidelines, processes, and challenges involved in predicting house prices using linear regression."
  • Goals

    • List the goals the documentation aims to achieve.
  • Scope

    • Define the scope of the documentation. Presenting it as a list can also be effective.

Template or Example (Continued)

  • Rules/Prerequisites

    • List any rules, requirements, or preconditions that must be met before proceeding with the subsequent steps outlined in the documentation.
      • For example, users must have R installed to perform the prediction of house prices.
  • Roles/Responsibilities

    • Outline the roles and responsibilities assigned to the team members involved in the project.
      • This is not mandatory, but listing each team member's assignments supports effective collaboration and clarifies individual contributions to the project.
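A prerequisite such as the R example above can also be checked by a small script before the documented steps begin. The following is a minimal sketch in Python, assuming that R's command-line executable is called `Rscript` and is discoverable on the PATH; the function name `missing_prerequisites` is hypothetical.

```python
import shutil


def missing_prerequisites(commands):
    """Return the commands from `commands` that cannot be found on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    # "Rscript" is an assumption: the executable installed with a standard R setup.
    missing = missing_prerequisites(["Rscript"])
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Listing the required commands in one place keeps the check in step with the Rules/Prerequisites section of the documentation.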

Template or Example (Continued)

  • Processes

    • Provide step-by-step instructions for each task.
  • Challenges

    • Common issues/challenges faced and their solutions
      • Provide insights on problems encountered during the project and how they were resolved.
  • References

    • List any references, files, or sources utilized during this project.
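The template sections listed so far can be turned into a reusable starting file. The following is a minimal sketch in Python that prints a plain-text skeleton. The section names come from the template above; the function name `render_skeleton`, the placeholder text, and the example title are hypothetical.

```python
# Section names taken from the template described above.
SECTIONS = [
    "Introduction",
    "Goals",
    "Scope",
    "Rules/Prerequisites",
    "Roles/Responsibilities",
    "Processes",
    "Challenges",
    "References",
]


def render_skeleton(title, sections=SECTIONS):
    """Return a plain-text documentation skeleton with one placeholder per section."""
    lines = [title, ""]
    for section in sections:
        lines.append(section)
        lines.append("  (to be written)")  # hypothetical placeholder text
        lines.append("")
    return "\n".join(lines)


# Hypothetical project title, used only for illustration.
print(render_skeleton("House Price Prediction"))
```

Starting every project from the same skeleton is one way to keep documentation consistent across a team.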

Template or Example (Continued)

  • Data Dictionary
    • Focuses on column names, definitions/descriptions, and attributes.
    • Definition of each variable:
      • Column Name: The name of the column.

      • Column Type: The type of data stored.

      • Description: What the variable represents.

      • Values: The values the column holds (e.g., ranges or categories).

Template or Example (Continued)

  • Data Dictionary
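To illustrate the four fields (Column Name, Column Type, Description, Values), here is a minimal sketch in Python that derives a data dictionary from a small dataset. The house-price rows, the column descriptions, and the function name `build_data_dictionary` are all hypothetical and invented for illustration. Numeric columns are summarised as a range and other columns as a list of categories.

```python
def build_data_dictionary(rows, descriptions):
    """Build one data dictionary entry per column of `rows` (a list of dicts)."""
    entries = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: summarise the values as a range.
            summary = f"{min(values)} to {max(values)}"
        else:
            # Any other column: list the distinct categories.
            summary = ", ".join(sorted({str(v) for v in values}))
        entries.append({
            "Column Name": column,
            "Column Type": type(values[0]).__name__,
            "Description": descriptions.get(column, ""),
            "Values": summary,
        })
    return entries


# Hypothetical sample data, invented purely for illustration.
houses = [
    {"price": 250000, "bedrooms": 3, "house_type": "terrace"},
    {"price": 410000, "bedrooms": 4, "house_type": "bungalow"},
]
descriptions = {
    "price": "Sale price of the house",
    "bedrooms": "Number of bedrooms",
    "house_type": "Category of the property",
}

for entry in build_data_dictionary(houses, descriptions):
    print(entry)
```

The descriptions still have to be written by someone who knows the data. Only the type and the value summary can be derived automatically.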

Best Practices

  • Be clear, simple, and concise

    • Aim for clarity in writing to enhance understanding.
  • Use consistent formats

    • Maintain uniform templates for documentation.

    • Ensure standardized naming conventions and definitions.

  • Regularly update documentation

    • Incorporate feedback from users to improve content and usability.
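Standardized naming conventions, mentioned above, are easier to keep when they are checked automatically. The following is a minimal sketch in Python, assuming the team has chosen snake_case for column names. That convention, the function name `nonconforming_names`, and the example column names are assumptions made for illustration.

```python
import re

# Assumed convention: lowercase words separated by single underscores (snake_case).
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def nonconforming_names(column_names):
    """Return the column names that do not follow the assumed snake_case convention."""
    return [name for name in column_names if not SNAKE_CASE.fullmatch(name)]


# Hypothetical column names, used only for illustration.
columns = ["house_price", "HouseType", "num bedrooms", "lot_size_m2"]
print(nonconforming_names(columns))
```

Running a check like this whenever the data dictionary is updated helps keep the names in the documentation and in the dataset aligned.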